{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n===================================================\nLasso model selection: Cross-Validation / AIC / BIC\n===================================================\n\nUse the Akaike information criterion (AIC), the Bayes Information\ncriterion (BIC) and cross-validation to select an optimal value\nof the regularization parameter alpha of the `lasso` estimator.\n\nResults obtained with LassoLarsIC are based on AIC/BIC criteria.\n\nInformation-criterion based model selection is very fast, but it\nrelies on a proper estimation of degrees of freedom, are\nderived for large samples (asymptotic results) and assume the model\nis correct, i.e. that the data are actually generated by this model.\nThey also tend to break when the problem is badly conditioned\n(more features than samples).\n\nFor cross-validation, we use 20-fold with 2 algorithms to compute the\nLasso path: coordinate descent, as implemented by the LassoCV class, and\nLars (least angle regression) as implemented by the LassoLarsCV class.\nBoth algorithms give roughly the same results. They differ with regards\nto their execution speed and sources of numerical errors.\n\nLars computes a path solution only for each kink in the path. As a\nresult, it is very efficient when there are only of few kinks, which is\nthe case if there are few features or samples. Also, it is able to\ncompute the full path without setting any meta parameter. On the\nopposite, coordinate descent compute the path points on a pre-specified\ngrid (here we use the default). Thus it is more efficient if the number\nof grid points is smaller than the number of kinks in the path. Such a\nstrategy can be interesting if the number of features is really large\nand there are enough samples to select a large amount. 
In terms of\nnumerical errors, for heavily correlated variables, Lars will accumulate\nmore errors, while the coordinate descent algorithm will only sample the\npath on a grid.\n\nNote how the optimal value of alpha varies for each fold. This\nillustrates why nested cross-validation is necessary when trying to\nevaluate the performance of a method for which a parameter is chosen by\ncross-validation: this choice of parameter may not be optimal for unseen\ndata.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\n# Author: Olivier Grisel, Gael Varoquaux, Alexandre Gramfort\n# License: BSD 3 clause\n\nimport time\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC\nfrom sklearn import datasets\n\n# This is to avoid division by zero while doing np.log10\nEPSILON = 1e-4\n\nX, y = datasets.load_diabetes(return_X_y=True)\n\nrng = np.random.RandomState(42)\nX = np.c_[X, rng.randn(X.shape[0], 14)]  # add some bad features\n\n# normalize data as done by Lars to allow for comparison\nX /= np.sqrt(np.sum(X ** 2, axis=0))\n\n# #############################################################################\n# LassoLarsIC: least angle regression with BIC/AIC criterion\n\nmodel_bic = LassoLarsIC(criterion='bic')\nt1 = time.time()\nmodel_bic.fit(X, y)\nt_bic = time.time() - t1\nalpha_bic_ = model_bic.alpha_\n\nmodel_aic = LassoLarsIC(criterion='aic')\nmodel_aic.fit(X, y)\nalpha_aic_ = model_aic.alpha_\n\n\ndef plot_ic_criterion(model, name, color):\n    alpha_ = model.alpha_ + EPSILON\n    alphas_ = model.alphas_ + EPSILON\n    criterion_ = model.criterion_\n    plt.plot(-np.log10(alphas_), criterion_, '--', color=color,\n             linewidth=3, label='%s criterion' % name)\n    plt.axvline(-np.log10(alpha_), color=color, linewidth=3,\n                label='alpha: %s estimate' % name)\n    plt.xlabel('-log(alpha)')\n    plt.ylabel('criterion')\n\n\nplt.figure()\nplot_ic_criterion(model_aic, 'AIC', 'b')\nplot_ic_criterion(model_bic, 'BIC', 'r')\nplt.legend()\nplt.title('Information-criterion for model selection (training time %.3fs)'\n          % t_bic)\n\n# #############################################################################\n# LassoCV: coordinate descent\n\n# Compute paths\nprint(\"Computing regularization path using the coordinate descent lasso...\")\nt1 = time.time()\nmodel = LassoCV(cv=20).fit(X, y)\nt_lasso_cv = time.time() - t1\n\n# Display results\nm_log_alphas = 
-np.log10(model.alphas_ + EPSILON)\n\nplt.figure()\nymin, ymax = 2300, 3800\nplt.plot(m_log_alphas, model.mse_path_, ':')\nplt.plot(m_log_alphas, model.mse_path_.mean(axis=-1), 'k',\n         label='Average across the folds', linewidth=2)\nplt.axvline(-np.log10(model.alpha_ + EPSILON), linestyle='--', color='k',\n            label='alpha: CV estimate')\n\nplt.legend()\n\nplt.xlabel('-log(alpha)')\nplt.ylabel('Mean square error')\nplt.title('Mean square error on each fold: coordinate descent '\n          '(train time: %.2fs)' % t_lasso_cv)\nplt.axis('tight')\nplt.ylim(ymin, ymax)\n\n# #############################################################################\n# LassoLarsCV: least angle regression\n\n# Compute paths\nprint(\"Computing regularization path using the Lars lasso...\")\nt1 = time.time()\nmodel = LassoLarsCV(cv=20).fit(X, y)\nt_lasso_lars_cv = time.time() - t1\n\n# Display results\nm_log_alphas = -np.log10(model.cv_alphas_ + EPSILON)\n\nplt.figure()\nplt.plot(m_log_alphas, model.mse_path_, ':')\nplt.plot(m_log_alphas, model.mse_path_.mean(axis=-1), 'k',\n         label='Average across the folds', linewidth=2)\n# Apply the same EPSILON shift as for the curves so the line stays aligned\nplt.axvline(-np.log10(model.alpha_ + EPSILON), linestyle='--', color='k',\n            label='alpha: CV estimate')\nplt.legend()\n\nplt.xlabel('-log(alpha)')\nplt.ylabel('Mean square error')\nplt.title('Mean square error on each fold: Lars (train time: %.2fs)'\n          % t_lasso_lars_cv)\nplt.axis('tight')\nplt.ylim(ymin, ymax)\n\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}